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Abstract 

Ecological networks are often composed of different sub-communities (often referred to as mod- 
ules). Identifying such modules has the potential to develop a better understanding of the assem- 
bly of ecological communities and to investigate functional overlap or specialisation. The most 

5 informative form of networks are quantitative or weighted networks. Here we introduce an al- 

^O 6 gorithm to identify modules in quantitative bipartite (or two-mode) networks. It is based on the 

qh 7 hierarchical random graphs concept of Clauset et al. (2008 Nature 453: 98-101) and is extended 

1 a to include quantitative information and adapted to work with bipartite graphs. We define the al- 

,-h 9 gorithm, which we call QuaBiMo, sketch its performance on simulated data and illustrate its 

<* io potential usefulness with a case study. 

co u 1 Introduction 

O 12 The ecological literature is replete with references to interacting groups of species within systems, 

j 13 variously termed compartments (May, 1973; Pimm, 1982; Prado & Lewinsohn, 2004), modules 

• • i4 (Olesen etal, 2007; Garcia-Domingo &Saldana, 2008; Dupont & Olesen, 2009), cohesive groups 

. £h 15 (Bascompte et al., 2003; Danieli-Silva^o:/., 2011; Guimaraes Jr et al., 2011) or simply commu- 

/\ i6 nities (Fortunato, 2010). Their attraction, for ecologists, is that they promise a way to simplify the 

j^ 17 description and understanding of an ecological system, by representing not each and every species, 

is but aggregating their interactions and energy fluxes into a more manageable set of modules (e.g. 

19 Allesina, 2009). In the following, we will refer to such aggregated sets of interacting species as 

20 'modules'. Their characteristic hallmark is that within-modvlQ interactions are more prevalent 

21 than between-module interactions (Newman, 2003; Newman & Girvan, 2004; Fortunato, 2010). 

22 In the extreme case, modules are completely separated from each another, and are then typ- 

23 ically called compartments (Pimm, 1982). This strict definition has seen some relaxation (Dicks 

24 et al, 2002), but most recent studies converge on the term module for any identifyable substruc- 

25 ture in interaction networks (Prado & Lewinsohn, 2004; Lewinsohn et al., 2006; Olesen et al, 

26 2007; Ings et al, 2009; Joppa et al., 2009; Cagnolo et al, 2010). 

27 The identification of modules, and the membership of species to modules, has received con- 

28 siderable interest in the physical sciences (as reviewed in extenso by Fortunato, 2010). Particularly 
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Figure 1: Bipartite graph of a quantitative pollinator- visitation network (Memmott, 1999). 
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the work of Newman and co-workers (e.g. Newman, 2003; Newman & Girvan, 2004) has practi- 
cally defined the current paradigm of module definition and identification. Algorithms to identify 
modules are "greedy", i.e. highly computationally intensive, relying on some way of rearranging 
module memberships and then quantifying "modularity" until a maximal degree of sorting has 
been achieved (Newman, 2004; Clauset et al, 2004; Newman, 2006; Schuetz & Caflisch, 2008). 
The focus of virtually all these algorithms was on unweighted and one-mode networks (see, e.g., 
Clauset et al, 2008; Lancichinetti & Fortunato, 2011; Jacobi et al, 2012, for a recent examples). 
Unweighted (or binary or qualitative) refers to the fact that only the presence of a link between 
species is known, but not its strength (Levins, 1975; Pimm, 1982). One-mode refers to the struc- 
ture of the community, in which all species are potentially interacting with each other. The typical 
ecological example is a n x n food web matrix, in which entries of 1 depict an existing interaction. 

In recent years, weighted and bipartite interaction networks have become more intensively 
studies. In a weighted network the link between two species is actually quantified (e.g. by the 
number of interactions observed or the strength of the interaction inferred from the data). In a bi- 
partite network the species fall into two different groups, which interact with members of the other 
group, but not within their group. A typical example are pollinator-visitation networks (Vazquez 
et al. , 2009), where pollinators interact with flowers, but flowers do not interact among themselves 
(see Fig. 1). Another well-studied example are host-parasitoid networks (e.g. Morris et al., 2004; 
Tylianakis et al., 2007) or seed dispersal networks (Schleuning et al., 2012). 

While popular among ecologists (Bliithgen, 2010; Schleuning et al., 2012; Poisot et al., 2012; 
Pocock et al, 2012), weighted bipartite graphs are not amenable to any of the existing module- 
detection algorithms for one-mode networks or for unweighted bipartite networks. Existing soft- 
ware uses only one-mode networks or, more precisely, one-mode projections of bipartite networks 
(Guimera et al, 2007; Martin Gonzalez et al, 2012; Thebault, 2013), while other approaches fo- 
cus on the identification of crucial species through quantifications of their position in the network 
(Borgatti, 2006). This lack of an algorithm to identify modules in quantitative, bipartite networks 
is particularly problematic, as such networks find their way into conservation ecological consid- 
erations (Tylianakis et al, 2010) and are the focus of a vibrant field of macroecological research 
(Ings et al, 2009). Furthermore, from a statistical point of view, weighted networks offer much 
more information and are less likely to lead to erroneous conclusions about the system (Barber, 
2007; Scotti et al, 2007; Bliithgen, 2010). 

Here we present an algorithm to identify modules (and modules within modules) in weighted 
bipartite networks. We build on an algorithm provided by Clauset et al. (2008) for unweighted, 
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Figure 2: A simulated 3 -compartment network in random sequence (left), as sorted by a correspon- 
dence analysis (centre) and by the modularity algorithm with default settings (right). 
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one-mode networks and the module criterion developed from Newman's one-mode version by 
Barber (2007). 



2 Modularity algorithms 
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Modules can be interpreted as link-rich clusters of species in a community. An alternative to 
finding and delimiting such modules is to group species by ordination (Lewinsohn et al, 2006). 
Correspondence analysis (CA) of the adjacency matrix is a simple and fast way to organise 
species. Typically, however, correspondence analysis will not be able to identify modules suf- 
ficiently well, even if modules are actually compartments (i.e. perfectly separated: Fig. 2 left, 
centre). The QuaBiMo algorithm we present here, can do so, at least in principle (Fig. 2 right). If 
modules are perfectly separated, with no species interacting with species in another module, they 
are called compartments and will be visible as clearly separated groups of species. It is relatively 
straightforward to implement a recursive compartment detection function, but compartments are 
much coarser than modules and not the topic of this publication. 

One algorithm proposed and available for detecting modules in bipartite networks is due to 
Guimera et al. (2007, called "bipart_w"), which is based on an a one-mode algorithm (Guimera 
et al., 2005). Their approach differs substantially from a truely bipartite algorithm in that they 
project the bipartite network into two one-mode networks (one for the higher, one for the lower 
level) and then proceed identifying the modules for the two levels separately, although they discuss 
the approach later developed by Barber (2007). The Guimera et al. approach was used in several 
ecological applications of modularity (Olesen et al., 2007; Dupont & Olesen, 2009; Fortuna et al., 
2010; Carstensen et al., 2011; Trojelsgaard & Olesen, 2013), although the algorithm does not 
allow for the identification of combined modules (as stated in Barber, 2007; Fortuna et al., 2010). 
Most recently, Thebault (2013) investigated, through simulations, the ability of three modularity 
measures (those of Newman & Girvan, 2004; Guimera et al., 2007; Barber, 2007) to identify 
modules in binary bipartite networks and comes out in support of that of Guimera et al. (2007). 

Finally, Allesina & Pascual (2009) have proposed an approach for one-mode networks. It 
identifies "groups", rather than modules, which reveal more about the structure of a food web 
than modules do, since also their relation towards each other emerges from the analysis. Their 
approach is based on a binary one-mode matrix, however, even when applied to bipartite networks 
(as was done by Martin Gonzalez et al, 2012). 
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2.1 QuaBiMo: a Quantitative Bipartite Modularity algorithm 

2.1.1 Outline 

The new algorithm builds on the Hierarchical Random Graph approach of Clauset et al. (2008), 
which builds a graph (i.e. a dendrogram) of interacting species so that nearby species are more 
likely to interact. It then randomly swaps branches at any level and evaluates whether the new 



97 graph is more likely than the previous one, recording and updating the best graph. The swapping 

98 is a Simulated Annealing-Monte Carlo approach, i.e. sometimes a worse graph is chosen as the 

99 starting point for the next swap, thereby avoiding being trapped in a local maximum. Each node 
ioo of the graph contains the information of whether it is part of a module, so that the graph can be 
ioi transgressed top-down to identify modules. 

102 Our modifications consists of (a) allowing branches between species to be weighted by the 

103 number of interactions observed between them, thereby making the algorithm quantitative; and 

104 (b) taking into account that species in one group can only interact with species in the other group, 

105 rather than the one-mode network the algorithm was initially developed for. Taken together, our 
loe algorithm computes modules in weighted, bipartite networks, based on a hierarchical representa- 
107 tion of species link weights and optimal allocation to modules. 

loa 2.1.2 Terminology 

109 A graph G = (V,E) denotes a set of vertices v G V connected by edges e G E. An edge e connects 

no two nodes, thus e = c(v,,v/), where v,- G V A Vj G V. G is a weighted (= quantitative) graph if 

in each edge e has a weight wGff associated with it (w C R + ). We normalise edge weights so that 

112 Hwew w = \. (For binary graphs w = l/\E\ for all existing edges, where | .| symbolises the number 

113 of elements.) 

114 For bipartite graphs, the vertices V are of two non-overlapping subsets, Vh and Vl (higher and 
us lower level), such that Vh l~) Vl = and for all edges the connected vertices are in different subsets: 
lie Vi G Vh <=^ vj G Vl ( <S=> symbolises equivalence, i.e. if we know v,- is in Vh, vj must be in Vl, 
117 and vice versa). 

us A graph can be represented as a dendrogram D, i.e. a binary tree with the vertices of the graph 

119 G being the tips (or leaves) of the dendrogram D. Thus, any internal split (or vertex) of D defines 

120 a subset of G. The idea of the algorithm is now to find internal vertices of D so that the subset it 

121 defines is a module. 

122 2.1.3 Goal function 

123 The algorithm has to divide G into a set of modules M such that 

124 1 . each module m G M is a connected subgraph of G. (This means each species has to have a 

125 partner.) 

126 2. each vertex v belongs to exactly one module m. (The uniqueness requirement.) 

127 3. edge weights within a module are higher than edge weights outside modules. (The modu- 

128 larity definition.) 

129 To specify point 3 above, Barber (2007) has defined modularity for weighted bipartite networks 

130 as 

Q = ^Z {Aij ~ Kij)d{mhmj) (1) 

131 where N is the total number of observed interactions in the network and A,y is the normalised 

132 observed number of interactions between i and j, i.e. the edge matrix E. The expected value, 

133 based on an appropriate null model, is given in the matrix K (see below). (Without normalisation, 

134 A and K represent the adjacency matrix and the null model matrix, respectively.) The module to 

135 which a species i or j is assigned is m,-, nij. The indicator function 5(m,-,m ; ) = 1 if m, = nij and 

136 if nit / nij. Q ranges from 0, which means the community has no more links within modules 

137 than expected by chance, to a maximum value of 1. The higher Q, the more do the data support 

138 the division of a network into modules. 

139 One crucial point of our modifications of the original algorithm was to assign an indicator 
mo value to each dendrogram vertex to label it as being within a module, or not. To do so, we have 
mi to compute the expected value for each value of A,y in order to be able to evaluate whether the 
142 observed value is lower or higher (the term over which eqn. 1 sums). This step is not required 
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Figure 3: Two possible moves in the swapping of randomly selected vertices v, and v^. The leaves 
depicted here could be actual leaves or, more often, sub-tree and are hence labelled s. The algorithm 
randomly chooses one of these two possible new configurations. 
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if edges are unweighted, since then the expectation will always be the same. For weighted edges 
however, we would expect the edge etj connecting two nodes i and j representing abundant species 
to have a high value of w,y. Similarly, nodes representing rare species could be expected to have 
low edge weights. 

Thus, at every vertex of the tree, the algorithm assembles the module defined by the vertex' 
position (i.e. including all leaves on its branches) and computes the expectation matrix K based 
on the cross product of marginal totals of all species in the module A ; and A, , divided by the sum 
of the number of observed interactions in that module: K = A^A, . Since we normalised all edge 
weight to sum to 1, K is actually a probability matrix. If the vertex gives rise to a module, i.e. if 
Hijem {A-ij —Kij) > 0, this vertex is labelled as a module. We can now sum the contributions of all 
vertices and modules according to eqn. 1 to compute to total modularity of graph G. For a formal 
description of this part of the algorithm, please see appendix A. 



155 2.1.4 Swapping 

156 The algorithm starts with a random dendrogram, where modularity Q is likely to be very low. 

157 Through random swapping of branches and their optimisation, Q increases during a Simulated 

158 Annealing procedure. The algorithm stops when a pre-defined number of swaps did not further 

159 increase the value of Q. 

i6o Random swaps are implemented as exchange of two randomly selected vertices in the dendro- 

i6i gram, subject to the following constraint (Fig. 3). The vertex to be swapped cannot be a leaf. Since 

162 terminal vertices always connect leaves from the two bipartitions V,- and V), thus representing an 

163 interaction, they can be swapped, while their leaves cannot. 

164 After each swap, the modularity of the entire dendrogram is re-computed (for computational 

165 efficiency only those parts affected by the swap). If the new configuration has a higher value of Q 
lee it is stored and becomes the new best dendrogram, otherwise the previous configuration will be 
167 used as the starting point for the next swap. A worse configuration is accepted with the probability 
lea p < eT , where 8Q is the change in modularity from the last configuration to the new one and T 

169 is the current temperature of the Simulated Annealing algorithm. We observed that the algorithm 

170 converges notably faster if the temperature is not decreased monotonously, but rather set back to 

171 the average temperature at which an increase in Q occurs. This is also a better approach in our 

172 case, since we do not know, a priori, how many steps the algorithm will take, or which value of Q 





Figure 4: Network simulation starts by defining the modules (left), then allocating to all links a number 
of interactions drawn from a negative binomial distribution (centre) and finally removing interactions 
in a module and placing them outside (right). High levels of noise, as shown here, yield poorly defined 
modules. Cells with a value of are shown in red. 
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can be obtained. 

Since the hierarchical dendrogram is computed through iterative proposing, evaluation and 
rejecting dendrogram structure in a Markov Chain Monte Carlo approach, Clauset et al. (2008)'s, 
and hence our, algorithm cannot guarantee finding the optimal module configuration. Since the 
algorithm is coded in C++, even billions of MCMC swaps are feasible in a few minutes, yielding 
reasonable results for typically sized ecological networks (see below) at acceptable handling time. 
For large networks, this algorithm can run for hours to days. See section 5 for an example session 
on how to employ the algorithm through R (R Development Core Team, 2012). 



i8i 2.2 Output & nested modules 

182 The algorithm returns an object identifying modules and sequence-vectors for species, as well as 

183 a re-order network ready for visualisation of modules and the modularity Q. 

184 QuaBiMo can be invoked recursively, searching for modules within modules. While such 

185 nested modules become ever smaller and are thus ever faster to detect, there are plenty of them 

186 and hence nesting will typically dramatically prolong the search for patterns. 
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3 Evaluation of the algorithm 



The detection of modules has theoretical limits related to the number of between-module links 
present, the sparceness of the network matrix and the size of the network (e.g. Fortunato & 
Barthelemy, 2007; Lancichinetti & Fortunato, 2011; Lancichinetti et al., 2010). In the follow- 
ing paragraphs we evaluate the QuaBiMo-algorithm for different simulated networks typical in 
size and noise for pollination networks. There is no technical reason why the algorithm should 
not work for much larger networks, too, given enough time for computing a large number of 
dendrogram configurations. Such an evaluation is outside the scope of this study. 
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3.1 Simulations to investigate algorithm sensitivity and specificity 
for noisy network data 

We analysed simulated networks of different noisiness to evaluate the performance of the mod- 
ularity algorithm. We would expect that modules become unidentifiable when the proportion of 
links within modules becomes as low as between modules. We hence simulate networks with in- 
creasing degree of noise by moving, randomly, interactions from within a module to a random 
position in the adjacency matrix not included in any module (Fig. 4). We simulated two sizes of 
networks (30 x 50 and 100 x 400), two levels of filling (achieved through setting the parameter 



203 "size" of the negative binomial distribution to 0.1, "low", or 1, "high" ), and two levels of mod- 

204 ularisation (3 and 10 modules). Each combination was evaluated for seven noise levels (0, 0.05, 

205 0.1, 0.2, 0.3, 0.4, 0.5) and replicated 15 times, yielding 840 different networks. Replicates differ 

206 in the size, position of modules and number of interactions per link. Sizes were maintained at the 

207 same two levels. 

208 Networks were simulated in three steps. First, we defined the size of the matrix and position 

209 and size of the modules. This initial network is a matrix of 0s except for all interactions in a 

210 module, which is thus identified by a block of Is. Then, second, we drew actual interactions for 

211 each link of a module from a strongly skewed negative binomial distribution (with size = 0.05 and 

212 ;U = 2), removed 80% (high filling) or 40% (low filling) of 0-values, and then replaced the initial 

213 Is of the module blocks by these random values. Accordingly, the modules had a connectance (= 

214 filling) of less than 100%. Higher filling of modules generally increases performance. Third, we 

215 randomly drew a proportion of interactions from the module and moved it to randomly selected 

216 columns and rows of these species outside the module. Thereby we effectively added noise to the 

217 network data. There is an upper limit to the third step, where modules become ill-defined. That is 

218 the case when the number of interactions outside modules is as high as inside. 

219 We ran the QuaBiMo algorithm five times on each network, saving the result with the highest 

220 modularity. This was more efficient in finding a good module configuration than running the 

221 algorithm for much longer. For comparison, we also ran the algorithm on a binarised version 

222 of the data. The code for simulations and analysis is available in appendix B; runtime for the 

223 simulatios was approximately two weeks on a standard desktop computer with 32 GB RAM. 

224 Congruence between the original assignment to modules and the one identified by the algo- 

225 rithm was assessed by means of a confusion matrix. Each link existing in the simulated data was 

226 classified as correctly belonging to a module, falsely assigned to a module, falsely not assigned to 

227 a module, or correctly not assigned to a module. The confusion matrix was then summarised as 

228 sensitivity, specificity and accuracy. 

229 3.2 Simulation results: modularity Q in binary and weighted net- 

230 works 

231 Modularity Q was strongly dependent on network size, the amount of noise added and the number 

232 of modules (Table 1). Most importantly, however, our quantitative approach strongly improved 

233 on modularity based on binary data, particularly for large networks (Fig. 5). Deterioration of the 

234 module detection with increasing network size could possibly be compensated for by increasing 

235 the number of swaps before terminating the search (see example session below). The loss of skill 

236 with increasing noise (Fig. 5, right) cannot be alleviated. Here the ability of QuaBiMo to use not 

237 only the binary but the weighted link information is already a dramatic improvement. 

238 In the following paragraphs, we shall only be looking at the results for the weighted networks, 

239 since that is the explicit focus of the QuaBiMo algorithm. 
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3.3 Simulation results: modularity Q 



241 Modularity Q and overall accuracy were affected very similarly by network size, noise and the 

242 number of modules (Table 2). The most prominent effects were those of size, noise and their 

243 interaction, depicted for Q and overall accuracy in Fig. 6. Evidently, larger networks are more 

244 difficult to modularise, as are those with a higher level of noise. 

245 3.4 Simulation results: classification accuracy 

246 While modularity Q gives an indication of how well observed links could be grouped into mod- 

247 ules (with a value of 1 indicating that all links are within and none between modules), we can also 

248 quantify the algorithm's accuracy based on a confusion table. Overall accuracy (= correct classi- 

249 fixation rate) is the proportion of all links correctly placed, i.e. (number of links correctly placed 
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Figure 5: Quality of modularity detection (left: Q; right: overall accuracy) depends on network size 
and type of information (binary or weighted). 



Table 1 : Effect of different simulation parameters on modularity Q and overall accuracy. Sum of 
squares and F-value can be taken as a measure of how strongly these parameters effect modularity. 
No significances are given since a test of an effect is nonsensical for simulations. Information refers 
to binary vs. weighted networks. 'Noise' has seven levels but was analysed as continuous variable. 
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Table 2: Effect of different simulation parameters on modularity Q for weighted networks. Sum of 
squares and F-value can be taken as a measure of how strongly these parameters effect modularity. 
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Figure 6: Effect of noise and network size on modularity Q (left) and overall accuracy (left). 



Table 3: Effect of different simulation parameters on module identification accuracy (weighted net- 
works only). 
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250 into modules + number of links correctly placed between modules)/total number of links. Since 

251 the purpose of the algorithm is the use of weighted network data, we here only present results for 

252 the weighted and not for the binary networks. 

253 The overall accuracy of module detection decreased with increasing noise levels (Table 3), an 

254 effect more pronounced for large networks than for small ones (Fig. 6 right). Again, this interac- 

255 tion probably could have been reduced if more steps until termination were allowed for the larger 

256 networks. 



3.5 Simulation results: sensitivity and specificity 



258 Classification accuracy has two elements: the correct classification of all module links as be- 

259 longing to modules (sensitivity) and the correct identification of between-module links as not 

260 belonging into modules (specificity). For the detection of patterns in networks high sensitivity is 
26i desirable, although this may inflate type II errors (i.e. we may identify modules that do not really 

262 exist). High specificity indicates that links allocated into modules are indeed correct, but possibly 

263 at the expense of not allocating many links to modules overall (leading to inflated type I errors). 

264 Sensitivity and specificity of the QuaBiMo-algorithm were driven by the same factors as over- 

265 all accuracy (Table 4). Increasing noise levels reduced both sensitivity and specificity, as did larger 

266 networks (Fig. 7). 

4 Identifying modules - an example session 

268 The QuaBiMo-algorithm is implemented in C++ and is made available through the open source 

269 R-package bipartite (Dormann et ah, 2009). The most important function is computeModules, 

270 which takes three arguments: the matrix representing the bipartite network data ("web"), a spec- 

271 ification of how many MCMC moves should yield no improvement before the algorithm stops 

272 ("steps", with default \E6) and a logical switch for computing nested modules ("deep", default- 

273 ing to FALSE). The number of steps should be adapted to the size of the network (see previous 

274 sections). We found that Q levels off very soon, once the default of one million is exceeded. 

275 However, we have not extensively trialled this setting for networks larger than that used below. 

276 As a typical analysis we shall use the relatively large (25 x 79) and well-sampled pollination 

277 network of Memmott (1999), which is provided along with the bipartite package: 

278 > library (bipartite) 

279 > mod <- computeModules (web=memmott 1999, steps=lE8) 

280 The evaluation of these two lines will usually take about one minute and perform around 20 
28i million MCMC moves. The resulting object stores the module composition and the likelihood of 

10 



Table 4: Effect of different simulation parameters on sensitivity and specificity of module identifica- 
tion (weighted networks only). 



Sensitivity 


df 


sum of 


squares 
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noise 
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size 
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fill 






0.46 


15 


no.of.modules 






3.01 


99 


noise: size 
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9 


noise: fill 
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Specificitiy 


df 


sum of 


squares 


F-value 


noise 






5.84 
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size 






11.73 
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fill 






0.48 
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no.of.modules 
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13 


Residuals 


813 




15.44 





0.2 ■ 



a 



J] 



30 x 50 
■ 100x400 

0.2 0.3 0.4 0.5 

noise level 



0.2 ■ 



_,_ 



30 x 50 

100x400 



0.2 0.3 0.4 0.5 

noise level 



Figure 7: Effect of noise and network size on sensitivity (right) and specificity (left) of the classifica- 
tion of links into modules. 
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282 the solution found. The modularity value Q of this solution is simply the likelihood value (0.18, 

283 this value may vary between runs; random seeding is not supported): 

284 > modOlikelihood 

285 [1] 0.18 

286 We can now plot the resulting modules to visualise the compartments (Fig. 8 top). 

287 > plotModuleWeb(mod) 

288 To identify nested modules, we choose a lower value for steps (to reduce computation time), thus 

289 also yielding a different module structure at the highest level. Modularity value Q will still be 

290 based on the non-recursive algorithm. 

291 > modn <- computeModules(memmottl999, steps=lE6, deep=T) 

292 To be able to ecologically interpret these modules (Fig. 8 bottom), expert knowledge on the system 

293 is required. The computation of modularity is primarily an explorative tool helping the user to 

294 objectively detect pattern in typically noisy network data. 



321 
322 



5 Modularity Q as a network index 



296 Modularity Q is likely to be correlated with other network metric, as specialisation of module 

297 members is the prime reason for the existence of modules. Across the 22 quantitative pollina- 

298 tion networks of the NCEAS "interaction webs" data base (http://www.nceas.ucsb.edu/ 

299 interact ionweb), Q was evidently highly positively correlated with complementary speciali- 

300 sation H' 2 (Fig. 9). Ecologically, the correlation with specialisation makes good sense. Modules 

301 only exist because some species do not interact with some others, i.e. because they are specialised. 

302 An overall low degree of specialisation is equivalent to random interactions, which will yield no 

303 modules. 

304 Furthermore, the absolute value of Q (like all network indices: Dormann et al, 2009) is de- 

305 pendent on network size (i.e. the number of species) as well as the number of links and the total 

306 number of interactions observed (see also Thebault, 2013). We would thus recommend a null 

307 model comparison (e.g. Vazquez & Aizen, 2003; Bluthgen et al., 2008; Dormann et al, 2009) to 

308 correct the observed value of Q by null model expectation (e.g. by standardising them to z-scores: 

309 z.q = " veJ " )■ In R, this could be achieved by the following code (which will take more than 

Cnull 

310 one hour since we are computing modules in 100 null model networks): 

311 > nulls <- nullmodel(memmottl999, N=100, method="r2d") 

312 > modules. nulls <- sapply (nulls, computeModules) 

313 > like. nulls <- sapply (modules. nulls, function(x) xOlikelihood) 

314 > (z <- (modOlikelihood - mean(like. nulls) )/sd(like. nulls)) 

315 

316 [1] 7.088665 

317 This means that the observed modularity is 7 standard deviations higher than would be expected 
3i8 from random networks with the same marginal totals (representing abundance distributions of 

319 plants and pollinators). Since z-scores are assumed to be normally distributed, values above rs 2 

320 are considered significantly modular. 



5.1 Using modularity to identifying species with important roles in 
the network 



323 Guimera et al. (2005) and Olesen et al. (2007) propose to compute standardised connection and 

324 participation values, called c and z, for each species to describe their role in networks, where 
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Figure 8: Interaction matrix featuring modules for the data of Memmott (1999). Top: Modules iden- 
tified by QuaBiMo (with steps=lE10, running for several hours; Q = 0.30). Darker squares indicate 
more observed interactions. Red boxes delineate the seven modules. (Note that results may vary be- 
tween runs.) In the central module yellow Asteraceae feature heavily, while a possible ecological 
cause pattern for the other modules is less apparent. Bottom: Nested modules based on a recursive 
call of QuaBiMo. Module arrangement is slightly different from top, since the algorithm is stochastic. 
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Figure 9: Modularity (Q) is highly correlated with specialisation H' 2 (Bliithgen et al, 2006) across 
22 pollination networks. Names refer to network data sets in bipartite which were taken from 
http : //www . nceas . ucsb . edu/interactionweb. 

c refers to the between-modules connectivity (called "participation coefficient" P by Guimera 
et al., 2005) and z refers to within-module degrees. Both are computed on the number of links 
and are not weighted by the number of interactions per link. Guimera et al. (2005) suggest critical 
values of c and z of 0.625 and 2.5, respectively. Species exceeding both of these values are called 
"hubs" because they link different modules, combining high between- with high within-module 
connectivity. 

In the case of the pollination network of Fig. 1, c-values range between and 0.78 (with 23 
of 79 pollinators and 13 of 25 plant species exceeding the threshold of 0.625); z-values range 
between —1.21 and 5.00 (with two pollinators but no plant species exceeding the value of 2.5: 
Fig. 10). Put together, only the syrphid Syritta pipiens (and hawkbit Leontodon hispidus almost) 
exceeded both thresholds and would thus be called a "hub species". As can be seen in Fig. 8, this 
syrphid is relatively rare but clearly not randomly distributed over the six modules, thus linking 
modules three, five and six (from the left). In contrast, Leontodon hispidus is a common plant 
species, visited by many different pollinators, and it actually links all modules with the exception 
of module two. 

To objectively define this threshold one could run null models of the original network and 
employ 95% quantiles as critical c- and z-values. For the pollinators in the network of Fig. 1 
these would be 0.67 (±0.039) and 1.45 (±0.220), respectively, based on 100 null models (for 
the plants: c critica i = 0.72 ±0.036 and z critica i = 1.78 ±0.297; Fig. 10 left). While this has little 
effect for plant species (except for moving Leontodon hispidus across the threshold), three more 
pollinators would become hub species (the common hoverfly Episyrphus balteatus, the tachinid 
fly Eriothrix rufomaculata and undetermined fly "Diptera spec.22"). 

6 Conclusion 

We here presented an algorithm to compute modularity Q and detect modules in weighted, bipar- 
tite networks. In a preliminary analysis, this approach was able to identify meaningful ecological 
modules in frugivore networks (Schleuning et al., in submission). Because it uses the strength 
of links as quantitative information, this approach should be much more sensitive, and also more 
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Figure 10: Connection (c) and participation (z) values for pollinators (left) and plants (right) in the 
network of Memmott (1999). Dashed black lines indicate critical values according to Olesen et al. 
(2007), those in grey 95% quantiles from 100 null models (see text). 

specific, than current binary algorithms. By making the algorithm easily available we hope that 
network ecology will benefit from new insights into the structure of interaction networks. 
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Appendix A : Formal definition of the identification of mod- 
ule vertices 

474 Consider an edge (i,j) G E with weight W{j representing the strength of interaction between ver- 

475 tices i and j. In a bipartite graph G maintaining for each vertex its original sum of edge weights, 

476 but disregarding the modular structure of G, the weight vv,y of the edge between vertices i and j is 

477 given by 

(2) 
0, else. 

478 Therefore, the difference of edge weight and expected edge weight 

w 'ij = W U ~ w ij ( 3 ) 

479 is positive, if within module, and negative, if outside module. 

480 

481 Therefore, the algorithm attempts to find the best trade-off between a maximum sum of w' 

482 within modules and a minimum sum of w' outside. 

483 Given a division of V into a set of non-overlapping subgraphs C, we define 



8(C) 



where 



2L 12 $c(i,j) x W^ — (I — Sc{i,j)) x w'^ , if Vc G C: c is connected graph 

iev A jev B " ' (4) 

— °°, else, 



8c(iJ) = { (5) 




485 Obviously, g(C) has to be maximized in order to find the best division of V into modules C. For 

486 achieving this goal, we modify the algorithm of Clauset et al. (2008). 

487 Let D be a binary tree with arbitrarily connected internal vertices v G Vi nte m and with n leaves 

488 representing the vertices of G and initially arranged in an arbitrary order. A module c within D 

489 is defined as the set of leaves of the sub-tree rooted at an internal vertex v meeting following 

490 requirements: 

491 I v has at least one child being a leaf. 

492 II No ancestor of v has a child being a leaf. 

493 III c n Va 7^ A c fl Vb 7^ 0, i.e. there is at least one vertex va € Va and at least one vertex vb € Vb 

494 within c. 

495 Due to requirement I it is obvious that there are at most mm(|Vyi|, |Vg|) modules. Note that due to 

496 requirement II on each path from the root of D to a leaf there is exactly one internal vertex shaping 

497 a module. For convenience, we will use the term 'module vertex' for this kind of vertex. 

498 Each internal vertex v is assigned the information r v whether it is the root of a sub-tree of D 

499 representing a module or whether it is below or above such an internal vertex. Let r v = 1 if v is 

500 above a module vertex, r v = if v is a module vertex itself and let r v = — 1 if v is below a module 

501 vertex. 

502 Additionally, each internal vertex v is assigned its contribution g v to g(C) 

+ I I4> i^v<0A £ J>y>0 

- L L w 'ij> ifr v = i (6) 

— oo ; else, 

503 where «Sf v is the set of leaves of the sub-tree rooted at the left child of v and, analogously, M v is 

504 the set of leaves of the sub-tree rooted at the right child of v. 

505 For C given by the current state of D, g(C) can now be rewritten as 

8(C) = £ gv ■ (7) 

V t V intern 

506 In order to compute max(g (C) ) , the subtrees of D have to be re-arranged. The algorithm therefore 

507 randomly selects an edge e of D connecting two internal vertices v,- and vy. Let w.l.o.g. e be the left 

508 edge of Vj connecting it to its child v,-. Then there are three subtrees Jzf v ., & Vi and £% Vj originating 

509 from Vi and vy, respectively, and two possible rearrangements a and j3 (Fig. 3) of which one is 

510 chosen randomly and simulated. In re-arrangement a, sub-trees S% Vi and 2% v . are permuted, in 

511 rearrangement /3 sub-trees Jz? Vi and & v .. The change dg in g(C) resulting from the rearrangement 

512 is computed according to r Vj and r v . 
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